Skip to content

rewrite prop<-() in C#396

Merged
t-kalinowski merged 21 commits intomainfrom
rewrite-prop<--in-c
Jun 12, 2024
Merged

rewrite prop<-() in C#396
t-kalinowski merged 21 commits intomainfrom
rewrite-prop<--in-c

Conversation

@t-kalinowski
Copy link
Member

@t-kalinowski t-kalinowski commented Dec 18, 2023

This PR moves prop<- to C. As part of this, there are some changes in behavior that are worth highlighting.

  • The mechanism by which prop<- avoids infinitely recursing on custom setters is different now.

    • Previously, it would keep track of the last called setter() prop name in a global variable, and use that to avoid calling a setter on another property of the same name. This failed if there were two custom setters calling each other, or if a setter set a property by the same name on a different object.
    • Now, prop<- sets attr(object, ".setting_prop") <- pairlist(prop_name) before calling the custom setter. In the case of recursive setters, the prop_names are appended to the pairlist. This attribute is consulted by prop<- when deciding whether to call a custom setter.
  • Previously, validate(object) was not called if a custom setter() was invoked. This I think was a bug/regression in prop<-, and is now fixed. validate(object) is called on the object returned by the setter(), if prop<- was called with check = TRUE (the default).

@t-kalinowski t-kalinowski changed the base branch from main to rewrite-prop-in-c December 18, 2023 17:31
@t-kalinowski

This comment was marked as outdated.

Base automatically changed from rewrite-prop-in-c to main December 18, 2023 18:44
Comment on lines +334 to +338
cat(sprintf('setting @a <- "%s"\n', a_value))
self@a <- a_value

cat(sprintf('setting @b <- "%s"\n', b_value))
self@b <- b_value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be better to do something like:

status <- character()

...

status <<- c(status, sprintf('setting @a <- "%s"\n', a_value))

and then expect_equal() it?

(Thinking about that validation snapshot test above which clearly changed at some point without me noticing it)

Copy link
Member Author

@t-kalinowski t-kalinowski Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to revisit this test after the next OOP-WG meeting, to get confirmation that the current behavior is what we want before expanding the tests.

Some questions for the next meeting:

  • Should a constructor invoke custom property setters when constructing an instance, if no value for the property was provided in the constructor call?
  • Should a constructor set default properties on an instance (properties with custom setters or otherwise), or should prop() resolve default values from the object parent class as needed?
  • Is "properties are attributes" stable, or do we anticipate storing properties as something more performant in the future? If it's not stable yet, should we offer an "official" api that class authors can use if they want to bypasses invoking a property setter from an internal method (i.e., another property setter)?

Copy link
Member Author

@t-kalinowski t-kalinowski Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOP-WG 2024-02-12 meeting discussion:

  • Agreement that a custom property setter should not be called during instance initialization.

  • Agreement that we should fetch a default property value from the parent class instead of requiring the default property attribute be attached to each instance, to allow for more light-weight instances.

  • Agreement that "properties are attributes" is a stable implementation detail, and that attr<- is the escape-hatch for advanced users that "know what they're doing" that want to set a property value without invoking the property setter.

  • Agreement that validate() should be called after prop<-()

  • No objections to the new approach for avoiding recursion in custom property setters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much more work do you want to do in this PR before merging?

Copy link
Member Author

@t-kalinowski t-kalinowski Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is good to merge (pending review from @DavisVaughan).
We'll want the following in another PR:

  • update @ to be able to fetch default property value from the parent,
  • update the class constructor so that it:
    • doesn't invoke custom property setters on object instantiation.
    • doesn't attach default property values to instances.
  • (maybe address some of the issues raised in External S7 classes #341 while we're touching that part of the code, w.r.t. the class instance constructors signature)
  • update/augment some of snapshot tests, so they're not just snapshot tests (e.g., call expect_equal() or similar), to catch future regressions on validate()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@t-kalinowski
Copy link
Member Author

t-kalinowski commented Dec 19, 2023

Some benchmarks. The new prop<- C implementation is faster in all situations, but the magnitude of the speedup varies. The total time is dominated by prop_validate() and validate(), which are still implemented in R. Moving prop_validate() to C would be the next step.

library(S7)
`propr<-` <- S7:::`propr<-`
`prop<-` <- S7::`prop<-`

Simplest case

foo <- new_class("foo", properties = list(xyz = class_double))
obj <- foo(123)
print(plot(print(df <- bench::mark(
  c = prop(obj, "xyz") <- 123,
  r = propr(obj, "xyz") <- 123,
  c_no_val = prop(obj, "xyz", check = FALSE) <- 123,
  r_no_val = propr(obj, "xyz", check = FALSE) <- 123
))))
#> # A tibble: 4 × 13
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 c            14.1µs  15.38µs    62945.    4.07KB     37.8  9994     6
#> 2 r           17.47µs  18.33µs    53961.    6.28KB     37.8  9993     7
#> 3 c_no_val   738.01ns 820.03ns  1122887.        0B    112.   9999     1
#> 4 r_no_val     3.77µs   4.14µs   236966.        0B     23.7  9999     1
#> # ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>
#> Loading required namespace: tidyr

Simple custom setter

foo2 <- new_class("foo2", properties = list(
  x = new_property(
    class_double,
    setter = function(self, value) {
      self@x <- as.double(value)
      self
      })
  ))

obj <- foo2("123")
print(plot(print(df <- bench::mark(
  c = prop(obj, "x") <- 456,
  r = propr(obj, "x") <- 456,
  c_no_val = prop(obj, "x", check = FALSE) <- 456,
  r_no_val = propr(obj, "x", check = FALSE) <- 456
))))
#> # A tibble: 4 × 13
#>   expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 c          16.65µs 17.71µs    54935.    22.2KB     27.5  9995     5    181.9ms
#> 2 r          22.39µs 23.66µs    41859.        0B     29.3  9993     7    238.7ms
#> 3 c_no_val    6.19µs  6.85µs   141648.        0B     28.3  9998     2     70.6ms
#> 4 r_no_val   22.26µs 23.82µs    41316.        0B     28.9  9993     7    241.9ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Many (100) attributes

foo3 <- new_class("foo3", properties =
  lapply(1:100, \(n) new_property(name = paste0("prop_", n), class_double)),
)

obj <- foo3(123)
print(plot(print(df <- bench::mark(
  c = prop(obj, "prop_50") <- 456,
  r = propr(obj, "prop_50") <- 456,
  c_no_val = prop(obj, "prop_50", check = FALSE) <- 456,
  r_no_val = propr(obj, "prop_50", check = FALSE) <- 456
))))
#> # A tibble: 4 × 13
#>   expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 c          17.18µs 18.08µs    53846.        0B     37.7  9993     7    185.6ms
#> 2 r          19.15µs 20.01µs    48047.        0B     28.8  9994     6      208ms
#> 3 c_no_val    2.09µs  2.54µs   381440.        0B     38.1  9999     1     26.2ms
#> 4 r_no_val    5.29µs  6.11µs   154421.        0B     30.9  9998     2     64.7ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Recursive custom property setters()

foo4 <- new_class("foo4",
  properties = list(
    x = new_property(setter = function(self, value) {
      prop(self, "x") <- 1
      prop(self, "y") <- value + 1
      self
    }),
    y = new_property(setter = function(self, value) {
      prop(self, "y") <- 2
      prop(self, "z") <- as.integer(value + 1)
      self
    }),
    z = new_property(class_integer)
  )
  # validator = function(self) {NULL}
)

obj <- foo4(123)
print(plot(print(df <- bench::mark(
  c = prop(obj, "x") <- 456,
  r = propr(obj, "x") <- 456,
  c_no_val = prop(obj, "x", check = FALSE) <- 456,
  r_no_val = propr(obj, "x", check = FALSE) <- 456
))))
#> # A tibble: 4 × 13
#>   expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 c            22.7µs 23.9µs    41038.    29.8KB     28.7  9993     7      244ms
#> 2 r            45.7µs   48µs    20464.        0B     29.8  9620    14      470ms
#> 3 c_no_val       12µs 12.9µs    75847.        0B     30.4  9996     4      132ms
#> 4 r_no_val     45.6µs 48.6µs    20057.        0B     29.4  9557    14      476ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Recursive custom property setters(), without validation

## same as foo4 (recursive setters), but internal prop<- calls have check=FALSE
foo4.1 <- new_class("foo4.1",
  properties = list(
    x = new_property(setter = function(self, value) {
      prop(self, "x", check = FALSE) <- 1
      prop(self, "y", check = FALSE) <- value + 1
      self
    }),
    y = new_property(setter = function(self, value) {
      prop(self, "y", check = FALSE) <- 2
      prop(self, "z", check = FALSE) <- as.integer(value + 1)
      self
    }),
    z = new_property(class_integer)
  )
)
obj <- foo4.1(123)
print(plot(print(df <- bench::mark(
  c_no_val = prop(obj, "x", check = FALSE) <- 456,
  r_no_val = propr(obj, "x", check = FALSE) <- 456
))))
#> # A tibble: 2 × 13
#>   expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 c_no_val     4.1µs  4.55µs   214221.    30.2KB     21.4  9999     1     46.7ms
#> 2 r_no_val    11.1µs 11.93µs    82421.        0B     33.0  9996     4    121.3ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Created on 2023-12-19 with reprex v2.0.2

@t-kalinowski t-kalinowski marked this pull request as ready for review December 19, 2023 16:04
@hadley
Copy link
Member

hadley commented Dec 19, 2023

Looks like a nice set of speedups!

Copy link
Collaborator

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't pay too much attention to the recursion bits, as I'm a bit too far away from the code mentally to fully understand that part right now

t-kalinowski and others added 6 commits February 16, 2024 11:31
Co-authored-by: Davis Vaughan <davis@posit.co>
Co-authored-by: Davis Vaughan <davis@posit.co>
Copy link
Member Author

@t-kalinowski t-kalinowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DavisVaughan for the thorough review! Much appreciated.

@t-kalinowski
Copy link
Member Author

@DavisVaughan (or others), do you have thoughts on the usage of Rf_shallow_duplicate() vs Rf_duplicate() here? (Is it safe to use Rf_shallow_duplicate() here? Any gotchas I may be missing?)

@DavisVaughan
Copy link
Collaborator

DavisVaughan commented Feb 20, 2024

Rf_shallow_duplicate() seems fine. IIRC it is mostly useful for lists or objects with lots of attributes where you want to duplicate the "core" object but you don't want to duplicate the list elements or attributes (like cloning a data frame!). Its the deep = false path of duplicate1():
https://github.com/wch/r-source/blob/25095948dbff496df29561218b4f7b5e9265c8b2/src/main/duplicate.c#L276

Note that Rf_shallow_duplicate() still makes a full copy of, say, an integer vector, if it is the core object.

R_shallow_duplicate_attr() might also be worth looking at. It will make a lightweight ALTREP wrapper over the original object where the ALTREP wrapper is allowed to have a different set of attributes from the original, but otherwise has the same "core" object (i.e. it won't copy the integer vector in my example above). It looks like it may actually make sense to use it here, since you are just changing attributes?
https://github.com/wch/r-source/blob/25095948dbff496df29561218b4f7b5e9265c8b2/src/main/duplicate.c#L594C6-L594C30

Note that the object itself has to be "large enough" before an altrep wrapper is considered, otherwise its not worth it:
https://github.com/wch/r-source/blob/25095948dbff496df29561218b4f7b5e9265c8b2/src/main/duplicate.c#L576

@t-kalinowski
Copy link
Member Author

Thanks @DavisVaughan, I agree that R_shallow_duplicate_attr seems like a good choice here. However, it is not (yet?) part of the official API (at least, it's not listed here: https://yutannihilation.github.io/R-fun-API/). I've added a comment as a reminder to update usage once R_shallow_duplicate_attr() is safe to use.

I think this PR is ready to merge.

@hadley
Copy link
Member

hadley commented Jun 12, 2024

I'm working on the build failures in #407

@hadley
Copy link
Member

hadley commented Jun 12, 2024

@t-kalinowski this warrants a news bullets I think, and then feel free to merge.

@t-kalinowski t-kalinowski merged commit 240badc into main Jun 12, 2024
@t-kalinowski t-kalinowski deleted the rewrite-prop<--in-c branch June 12, 2024 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants